EXTRACT A DATABASE RECORD FROM A
STRUCTURED LITERATURE DATABASE



PARSE THE DATABASE RECORD TO EXTRACT ONE
OR MORE INDIVIDUAL INFORMATION FIELDS
INCLUDING A SET OF CHEMICAL OR BIOLOGICAL
MOLECULE NAMES



FILTER THE EXTRACTED SET OF CHEMICAL OR 
BIOLOGICAL MOLECULE NAMES TO CREATE A 
FILTERED SET OF CHEMICAL OR BIOLOGICAL 
MOLECULE NAMES 



FILTERED SET 
STORED 
IN AN INFERENCE 
DATABASE? 



NO



YES 



TO B, FIG. 2B




STORE ANY NEW CHEMICAL OR BIOLOGICAL
MOLECULE NAMES FROM THE FILTERED SET IN
THE INFERENCE DATABASE AND SET A
CO-OCCURRENCE COUNT TO A START VALUE

INCREMENT CO-OCCURRENCE COUNTS FOR
PAIRS OF CHEMICAL OR BIOLOGICAL MOLECULE
NAMES IN THE INFERENCE DATABASE THAT
CO-OCCUR

CONSTRUCT AN OPTIONAL CONNECTION
NETWORK USING ONE OR MORE DATABASE
RECORDS FROM THE INFERENCE DATABASE



APPLY ONE OR MORE ANALYSIS METHODS TO
DETERMINE POSSIBLE INFERENCES REGARDING
CHEMICAL OR BIOLOGICAL MOLECULES

GENERATE AUTOMATICALLY ONE OR MORE
INFERENCES REGARDING CHEMICAL OR
BIOLOGICAL MOLECULES



END

FIG. 3
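The store-and-increment steps of the FIG. 3 flow can be sketched as follows. This is a minimal sketch, not the patented implementation: the inference database is modeled as two in-memory dictionaries, the function name `record_co_occurrences` and the start value of 0 are assumptions, and the second input record is invented purely to show a count being incremented.

```python
from itertools import combinations


def record_co_occurrences(name_ids, pair_counts, filtered_names, start_value=0):
    """Store new names, then increment the count of every co-occurring pair.

    name_ids:    dict mapping molecule name -> integer ID (hypothetical layout)
    pair_counts: dict mapping (id1, id2) with id1 < id2 -> co-occurrence count
    """
    # Store any names not yet present, assigning the next free ID.
    for name in filtered_names:
        if name not in name_ids:
            name_ids[name] = len(name_ids) + 1

    # Increment the count for every unordered pair found in this record.
    ids = sorted({name_ids[name] for name in filtered_names})
    for pair in combinations(ids, 2):
        pair_counts[pair] = pair_counts.get(pair, start_value) + 1


name_ids, pair_counts = {}, {}
record_co_occurrences(
    name_ids,
    pair_counts,
    ["THROMBIN", "HERPES SIMPLEX VIRUS TYPE 1 PROTEIN UL9", "DNA"],
)
# Second record is made up for illustration only.
record_co_occurrences(name_ids, pair_counts, ["THROMBIN", "DNA"])

print(name_ids)
print(pair_counts)
```

Keying each pair by sorted IDs keeps one entry per unordered pair, so the order in which names appear in a record does not matter.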

AU - MARTINEZ R
AU - EDWARDS CA
TI - EXPRESSION, PURIFICATION AND FUNCTIONAL
     CHARACTERIZATION OF THE DNA-BINDING DOMAIN OF THE
     HERPES SIMPLEX VIRUS TYPE 1 UL9 PROTEIN.
RN - EC 3.4.21.5 (THROMBIN)
RN - 0 (VIRAL PROTEINS)
RN - 115004-77-8 (HERPES SIMPLEX VIRUS TYPE 1 PROTEIN UL9)
RN - 9007-49-2 (DNA)

THROMBIN
VIRAL PROTEINS
HERPES SIMPLEX VIRUS TYPE 1 PROTEIN UL9
DNA

FILTER WORDS: VIRAL PROTEINS

THROMBIN
HERPES SIMPLEX VIRUS TYPE 1 PROTEIN UL9
DNA

INFERENCE DATABASE

ID  NAME
1   THROMBIN
2   HERPES SIMPLEX.
3   DNA

CO-OCCURRENCE COUNTS

ID1  ID2  COUNT
1    2    12
1    3    4
2    3

INFERENCE: HERPES SIMPLEX VIRUS TYPE 1
PROTEIN UL9 INTERACTS WITH DNA

FIG. 4
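The parse and filter steps illustrated in FIG. 4 can be sketched as below. It assumes the record is available as plain text with one tagged field per line and that molecule names are the parenthesized part of each RN field; the regular expression and the helper names `extract_names` and `filter_names` are illustrative choices, not taken from the source.

```python
import re

# RN fields of the FIG. 4 record, transcribed onto single lines.
RECORD = """\
AU - MARTINEZ R
AU - EDWARDS CA
RN - EC 3.4.21.5 (THROMBIN)
RN - 0 (VIRAL PROTEINS)
RN - 115004-77-8 (HERPES SIMPLEX VIRUS TYPE 1 PROTEIN UL9)
RN - 9007-49-2 (DNA)
"""

FILTER_WORDS = {"VIRAL PROTEINS"}

# Assumed field shape: "RN - <registry number> (<NAME>)".
RN_FIELD = re.compile(r"RN\s*-\s*.*?\((.+)\)\s*$")


def extract_names(record):
    """Return the parenthesized name from every RN field, in record order."""
    names = []
    for line in record.splitlines():
        match = RN_FIELD.match(line)
        if match:
            names.append(match.group(1))
    return names


def filter_names(names, filter_words):
    """Drop names that appear in the filter-word list."""
    return [name for name in names if name not in filter_words]


extracted = extract_names(RECORD)
filtered = filter_names(extracted, FILTER_WORDS)
print(filtered)
```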

CREATE A CONNECTION NETWORK FROM AN
INFERENCE DATABASE, WHERE THE CONNECTION
NETWORK INCLUDES TWO OR MORE NODES
CONNECTED BY ONE OR MORE ARCS, WHERE THE
ONE OR MORE ARCS REPRESENT
CO-OCCURRENCES BETWEEN CHEMICAL OR
BIOLOGICAL MOLECULES, WHERE THE
INFERENCE DATABASE INCLUDES DATABASE
RECORDS WITH ONE OR MORE INFERENCE
ASSOCIATIONS



APPLY ONE OR MORE ANALYSIS METHODS TO
THE CONNECTION NETWORK TO DETERMINE ANY
TRIVIAL INFERENCE ASSOCIATIONS

DELETE DATABASE RECORDS FROM THE
INFERENCE DATABASE DETERMINED TO INCLUDE
TRIVIAL INFERENCE ASSOCIATIONS, THEREBY
IMPROVING INFERENCE KNOWLEDGE IN THE
INFERENCE DATABASE



END

FIG. 5
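The three FIG. 5 steps (build the network, flag trivial associations, delete the matching records) might look like the sketch below. The flowchart does not say how triviality is decided, so a simple minimum-count threshold is used here as a stand-in; the threshold, the function names, and the node labels and counts are all hypothetical.

```python
def build_network(pair_counts):
    """Build an undirected network: node -> {neighbor: co-occurrence count}."""
    network = {}
    for (a, b), count in pair_counts.items():
        network.setdefault(a, {})[b] = count
        network.setdefault(b, {})[a] = count
    return network


def trivial_pairs(network, min_count=2):
    """Flag arcs whose count falls below min_count (assumed triviality test)."""
    return {
        tuple(sorted((a, b)))
        for a, neighbors in network.items()
        for b, count in neighbors.items()
        if count < min_count
    }


def delete_trivial(pair_counts, trivial):
    """Return the database records that are not flagged as trivial."""
    return {pair: n for pair, n in pair_counts.items() if pair not in trivial}


# Illustrative records only; keys are sorted (node, node) tuples.
pair_counts = {("A", "B"): 12, ("A", "C"): 4, ("B", "C"): 1}

network = build_network(pair_counts)
trivial = trivial_pairs(network)
kept = delete_trivial(pair_counts, trivial)
print(kept)
```

Any other analysis method could replace `trivial_pairs` as long as it returns the set of pairs to delete.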

EXTRACT TWO OR MORE CHEMICAL OR
BIOLOGICAL MOLECULE NAMES FROM A
DATABASE RECORD FROM AN INFERENCE
DATABASE FOR A FIRST CHEMICAL OR
BIOLOGICAL MOLECULE-A AND A SECOND
CHEMICAL OR BIOLOGICAL MOLECULE-B. THE
INFERENCE DATABASE INCLUDES A PLURALITY
OF INFERENCE DATABASE RECORDS CREATED
FROM AN INDEXED LITERATURE DATABASE (98)

DETERMINE A LIKELIHOOD STATISTIC FOR A
CO-OCCURRENCE BETWEEN A FIRST CHEMICAL OR
BIOLOGICAL MOLECULE NAME-A AND A SECOND
CHEMICAL OR BIOLOGICAL MOLECULE NAME-B
EXTRACTED FROM THE DATABASE RECORD (100)



APPLY THE LIKELIHOOD STATISTIC TO
DETERMINE IF THE CO-OCCURRENCE BETWEEN
THE FIRST CHEMICAL OR BIOLOGICAL
MOLECULE-A AND THE SECOND CHEMICAL OR
BIOLOGICAL MOLECULE-B IS A NON-TRIVIAL
CO-OCCURRENCE REFLECTING PHYSICO-CHEMICAL
INTERACTIONS (102)
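The flowchart above does not name the likelihood statistic. As one possible stand-in, the sketch below uses pointwise mutual information (PMI), which compares how often molecule A and molecule B co-occur with how often they would co-occur if they appeared independently. The choice of PMI, the threshold of 1.0, and all counts are assumptions made for illustration.

```python
import math


def pmi(n_ab, n_a, n_b, n_records):
    """Pointwise mutual information (base 2) of A and B.

    n_ab:      records naming both A and B
    n_a, n_b:  records naming A, records naming B
    n_records: total records in the inference database
    """
    if n_ab == 0:
        return float("-inf")
    return math.log2(n_ab * n_records / (n_a * n_b))


def is_non_trivial(n_ab, n_a, n_b, n_records, threshold=1.0):
    """Treat the co-occurrence as non-trivial when PMI exceeds the threshold."""
    return pmi(n_ab, n_a, n_b, n_records) > threshold


# Made-up counts: A and B co-occur ten times more often than chance predicts.
print(pmi(12, 40, 30, 1000))
print(is_non_trivial(12, 40, 30, 1000))
# Two very common names that co-occur once score below zero.
print(is_non_trivial(1, 500, 400, 1000))
```

A positive score means the pair co-occurs more often than independence would predict, which is the property the non-trivial test relies on.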

SELECT A TARGET NODE FROM A FIRST LIST OF
NODES CONNECTED BY A PLURALITY OF ARCS IN
A CONNECTION NETWORK, WHERE THE
CONNECTION NETWORK INCLUDES ONE OR MORE
NODES REPRESENTING ONE OR MORE CHEMICAL
OR BIOLOGICAL MOLECULE NAMES AND ONE OR
MORE ARCS CONNECTING THE ONE OR MORE
NODES IN A PRE-DETERMINED ORDER, AND
WHERE THE ONE OR MORE ARCS REPRESENT
CO-OCCURRENCE VALUES OF PHYSICO-CHEMICAL
INTERACTIONS BETWEEN CHEMICAL OR
BIOLOGICAL MOLECULES



CREATE A SECOND LIST OF NODES BY
CONSIDERING SIMULTANEOUSLY ONE OR MORE
OTHER NODES THAT ARE NEIGHBORS OF THE
TARGET NODE AS WELL AS NEIGHBORS OF THE
OTHER NODES IN THE PRE-DETERMINED ORDER
IN THE CONNECTION NETWORK

SELECT A NEXT NODE FROM THE SECOND LIST
OF NODES USING THE CO-OCCURRENCE VALUES,
WHERE THE NEXT NODE IS A MOST LIKELY NEXT
NODE AFTER THE TARGET NODE IN THE
PRE-DETERMINED ORDER FOR THE CONNECTION
NETWORK BASED ON THE CO-OCCURRENCE
VALUES

END
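The next-node selection above can be sketched as follows. The candidate list is taken to be the target's neighbors plus the neighbors of those neighbors. The scoring rule (direct co-occurrence count plus the weakest link of each two-step path) is an assumption, since the flowchart only says the choice is based on the co-occurrence values; the network and its counts are invented, and the pre-determined ordering of arcs is not modeled.

```python
def candidate_nodes(network, target):
    """Second list: neighbors of the target and neighbors of those neighbors."""
    neighbors = set(network.get(target, {}))
    second_order = set()
    for node in neighbors:
        second_order.update(network.get(node, {}))
    return (neighbors | second_order) - {target}


def score(network, target, candidate):
    """Assumed score: direct count plus support through shared neighbors."""
    direct = network[target].get(candidate, 0)
    indirect = sum(
        min(weight, network[node][candidate])
        for node, weight in network[target].items()
        if candidate in network.get(node, {})
    )
    return direct + indirect


def select_next_node(network, target):
    """Return the highest-scoring candidate, or None if there is none."""
    candidates = sorted(candidate_nodes(network, target))
    return max(candidates, key=lambda c: score(network, target, c), default=None)


# Illustrative undirected network: node -> {neighbor: co-occurrence value}.
network = {
    "A": {"B": 12, "C": 4},
    "B": {"A": 12, "C": 3, "D": 7},
    "C": {"A": 4, "B": 3},
    "D": {"B": 7},
}

print(select_next_node(network, "A"))
```

Sorting the candidates before calling `max` makes tie-breaking deterministic.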

SELECT A POSITION IN A CONNECTION NETWORK
FOR AN UNKNOWN TARGET NODE FROM A FIRST
LIST OF NODES CONNECTED BY ONE OR MORE
ARCS. THE CONNECTION NETWORK INCLUDES
ONE OR MORE NODES REPRESENTING ONE OR
MORE CHEMICAL OR BIOLOGICAL MOLECULE
NAMES AND ONE OR MORE ARCS CONNECTING
THE ONE OR MORE NODES IN A PRE-DETERMINED
ORDER. THE ONE OR MORE ARCS REPRESENT
CO-OCCURRENCE VALUES OF PHYSICO-CHEMICAL
INTERACTIONS BETWEEN CHEMICAL
OR BIOLOGICAL MOLECULES.

DETERMINE A SECOND LIST OF NODES PRIOR TO
THE POSITION OF THE UNKNOWN TARGET NODE
IN THE CONNECTION NETWORK

DETERMINE A THIRD LIST OF NODES
SUBSEQUENT TO THE POSITION OF THE UNKNOWN
TARGET NODE IN THE CONNECTION NETWORK

DETERMINE A FOURTH LIST OF NODES INCLUDED
IN BOTH THE SECOND LIST OF NODES AND THE
THIRD LIST OF NODES



DETERMINE AN IDENTITY FOR THE UNKNOWN 
TARGET NODE FROM THE FOURTH LIST OF NODES 
USING A LIKELIHOOD STATISTIC 
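One reading of the steps above: the second and third lists are the nodes connected to the known nodes just before and just after the unknown position, and the fourth list is their intersection. The sketch below follows that reading, which is an interpretation rather than something the flowchart states. The support score (the weaker of the two co-occurrence values) is a hypothetical stand-in for the unspecified likelihood statistic, and the network rows are invented.

```python
def identify_unknown(network, prior_node, next_node):
    """Pick the most supported node linked to both sides of the gap."""
    # Second list: nodes connected to the node before the unknown position.
    second = set(network.get(prior_node, {})) - {next_node}
    # Third list: nodes connected to the node after the unknown position.
    third = set(network.get(next_node, {})) - {prior_node}
    # Fourth list: nodes present in both lists.
    fourth = second & third

    def support(candidate):
        # Assumed statistic: the weaker of the candidate's two links.
        return min(network[prior_node][candidate], network[next_node][candidate])

    return max(sorted(fourth), key=support, default=None)


# Only the two adjacency rows needed for the example are shown.
network = {
    "P": {"X": 9, "Y": 2, "Z": 5},
    "N": {"X": 6, "Y": 8, "W": 4},
}

print(identify_unknown(network, "P", "N"))
```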

FIG. 9

START



CONSTRUCT A CONNECTION NETWORK USING
ONE OR MORE DATABASE RECORDS FROM AN
INFERENCE DATABASE. THE CONNECTION
NETWORK INCLUDES ONE OR MORE NODES
FOR CHEMICAL OR BIOLOGICAL MOLECULES AND
BIOLOGICAL PROCESSES FOUND TO CO-OCCUR
ONE OR MORE TIMES. THE ONE OR MORE NODES
ARE CONNECTED BY ONE OR MORE ARCS IN A
PRE-DETERMINED ORDER. THE INFERENCE
DATABASE WAS CREATED FROM CHEMICAL OR
BIOLOGICAL MOLECULE AND BIOLOGICAL
PROCESS INFORMATION EXTRACTED FROM A
STRUCTURED LITERATURE DATABASE.

APPLY ONE OR MORE LIKELIHOOD STATISTIC
ANALYSIS METHODS TO THE CONNECTION
NETWORK TO DETERMINE POSSIBLE INFERENCES
BETWEEN THE CHEMICAL OR BIOLOGICAL
MOLECULES AND A BIOLOGICAL PROCESS

GENERATE AUTOMATICALLY ONE OR MORE
BIOLOGICAL INFERENCES BETWEEN CHEMICAL
OR BIOLOGICAL MOLECULES AND A BIOLOGICAL
PROCESS USING RESULTS FROM THE
LIKELIHOOD STATISTIC ANALYSIS METHODS



END
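The molecule-to-process flow above can be sketched end to end. Each record is assumed to carry a list of molecule names and a list of biological process names. Pointwise mutual information is used as a placeholder for the unspecified likelihood statistic, and the 0.5 threshold, the function name `infer_processes`, and every record are invented for illustration.

```python
import math
from collections import Counter


def infer_processes(records, threshold=0.5):
    """Return (molecule, process, score) triples scoring above the threshold."""
    term_counts = Counter()   # records mentioning each molecule or process
    pair_counts = Counter()   # records mentioning a molecule with a process

    for molecules, processes in records:
        term_counts.update(set(molecules) | set(processes))
        for molecule in set(molecules):
            for process in set(processes):
                pair_counts[(molecule, process)] += 1

    n_records = len(records)
    inferences = []
    for (molecule, process), n_pair in sorted(pair_counts.items()):
        expected = term_counts[molecule] * term_counts[process]
        score = math.log2(n_pair * n_records / expected)
        if score > threshold:
            inferences.append((molecule, process, round(score, 3)))
    return inferences


# Made-up records: (molecule names, biological process names).
records = [
    (["MOLECULE-A"], ["PROCESS-P"]),
    (["MOLECULE-A", "MOLECULE-B"], ["PROCESS-P"]),
    (["MOLECULE-B"], ["PROCESS-Q"]),
    (["MOLECULE-B"], ["PROCESS-Q"]),
]

for molecule, process, score in infer_processes(records):
    print(f"INFERENCE: {molecule} IS ASSOCIATED WITH {process} (score {score})")
```

With these records only the MOLECULE-A and PROCESS-P pair clears the threshold; the other two pairs co-occur too rarely relative to how common their terms are.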



